Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering
Abstract
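This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness of the generated cluster. This means that the compactness plays the role of a similarity measure between the objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness also plays the role of a measure of cluster quality. We also show that the average compactness of each feature with respect to objects and/or clusters in several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method using an artificial data set and real histogram-valued data sets.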
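The agglomeration rule described in the abstract can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: each object is described by one interval per feature (a single bin-rectangle rather than a fixed number of equal probability bin-rectangles), and the concept size of a feature is taken to be the interval span normalized by the overall feature range. All function names (`join`, `feature_spans`, `compactness`, `agglomerate`, `average_feature_compactness`) and the toy data are hypothetical.

```python
from itertools import combinations

def join(a, b):
    """Per-feature bounding interval of two descriptions (assumed merge rule)."""
    return [(min(lo1, lo2), max(hi1, hi2)) for (lo1, hi1), (lo2, hi2) in zip(a, b)]

def feature_spans(desc, domain):
    """Normalized span of each feature: a simplified stand-in for concept size."""
    return [(hi - lo) / (dhi - dlo) for (lo, hi), (dlo, dhi) in zip(desc, domain)]

def compactness(desc, domain):
    """Average of the per-feature normalized spans (assumed definition)."""
    spans = feature_spans(desc, domain)
    return sum(spans) / len(spans)

def agglomerate(objects):
    """At each step, merge the pair whose joined description is most compact."""
    n_features = len(next(iter(objects.values())))
    # Overall range of every feature, used for normalization.
    domain = [
        (min(d[k][0] for d in objects.values()), max(d[k][1] for d in objects.values()))
        for k in range(n_features)
    ]
    clusters = {(name,): desc for name, desc in objects.items()}
    history = []  # one (members, per-feature spans) record per merge step
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: compactness(join(clusters[pair[0]], clusters[pair[1]]), domain),
        )
        merged = join(clusters.pop(a), clusters.pop(b))
        clusters[a + b] = merged
        history.append((a + b, feature_spans(merged, domain)))
    return history

def average_feature_compactness(history, steps):
    """Mean normalized span of each feature over the first `steps` merges."""
    records = [spans for _, spans in history[:steps]]
    return [sum(column) / len(records) for column in zip(*records)]

# Toy data (made up): feature 0 separates two groups, feature 1 does not.
objects = {
    "A": [(0.0, 1.0), (0.0, 9.0)],
    "B": [(1.0, 1.5), (1.0, 10.0)],
    "C": [(8.0, 9.0), (0.0, 10.0)],
    "D": [(9.0, 10.0), (2.0, 9.0)],
}
history = agglomerate(objects)
scores = average_feature_compactness(history, steps=2)
for members, spans in history:
    print(members, [round(s, 3) for s in spans])
print("average compactness per feature:", [round(s, 3) for s in scores])
```

In this toy setting the feature with the smaller average score is the one along which early merges stay tight, which mirrors the idea of using average compactness as a feature effectiveness criterion. A fuller version would replace the interval descriptions with quantile-based bin-rectangles.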
Related articles
Histogram Clustering for Unsupervised
This paper introduces a novel statistical mixture model for probabilistic grouping of distributional (histogram) data. Adopting the Bayesian framework, we propose to perform annealed maximum a posteriori estimation to compute optimal clustering solutions. In order to accelerate the optimization process, an efficient multiscale formulation is developed. We present a prototypical application of thi...
Exploiting Hierarchical Structures for Unsupervised Feature Selection
Feature selection has been proven to be effective and efficient in preparing high-dimensional data for many mining and learning tasks. Features of real-world high-dimensional data such as words of documents, pixels of images and genes of microarray data, usually present inherent hierarchical structures. In a hierarchical structure, features could share certain properties. Such information has b...
Dissimilarity measures for histogram-valued data and divisive clustering of symbolic objects
Contemporary datasets are becoming increasingly larger and more complex, while techniques to analyse them are becoming more and more inadequate. Thus, new methods are needed to handle these new types of data. This study introduces methods to cluster histogram-valued data. However, histogram-valued data are difficult to handle computationally because observations typically have a different numbe...
Hierarchical and Pyramidal Clustering for Symbolic Data
This paper presents a method for clustering a set of symbolic data where individuals are described by symbolic variables of various types: interval, categorical multi-valued or modal variables, which take into account the variability or uncertainty present in the data. Hierarchical and pyramidal clustering models are considered. The constructed clusters correspond to concepts, that is, they are...
Journal
Journal title: Stats
Year: 2021
ISSN: 2571-905X
DOI: https://doi.org/10.3390/stats4020024